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We present cluster Monte Carlo algorithms for the XYZ quantum spin mod- 
els. In the special case of S = 1/2, the new algorithm can be viewed as 
l_ j a cluster algorithm for the 8-vertex model. As an example, we study the 

00 5 = 1/2 XY model in two dimensions with a representation in which the 

quantization axis lies in the easy plane. We find that the numerical autocor- 
relation time for the cluster algorithm remains of the order of unity and does 

j- not show any significant dependence on the temperature, the system size, or 

the Trotter number. On the other hand, the autocorrelation time for the con- 
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ventional algorithm strongly depends on these parameters and can be very 



large. The use of improved estimators for thermodynamic averages further 
enhances the efficiency of the new algorithms. 
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I. INTRODUCTION 



Recently, researchers have emphasized the importance of global updating in Monte Carlo 
methods. Especially for the critical phenomena and low-temperature behavior of models for 
condensed matter, the difficulty due to diverging numerical auto-correlation times is univer- 
sal. A cluster algorithm based on the Fortuin-Kasteleyn (FK) [1] percolation representation 
was proposed by Swendsen and Wang [2], and was proven to be very powerful in reducing 
the autocorrelation time. This cluster algorithm is a successful example of a global updating 
Monte Carlo method. However, the Swendsen- Wang (SW) algorithm can apply only to the 
Ising model and its generalization is not straight-forward. 

In our previous paper [3], which we will refer to as I, we generalized the FK representation 
to the XXZ quantum spin systems. We also demonstrated [4] that the new cluster algorithm 
can be faster than the conventional algorithm by several orders of magnitude. In the special 
case of spin models with S = 1/2, the new algorithms are equivalent to the cluster algorithms 
for the 6-vertex model proposed in [5] because the S = 1/2 quantum XX Z model can 
be mapped to a 6-vertex model. A similar cluster algorithm was developed also for the 
Hubbard model [6]. It is shown in I that the new cluster representation analogous to the 
FK representation is available to any model described by the XX Z Hamiltonian regardless 
of the magnitude of the spins. 

In this paper, we discuss the details of the FK-type representation and cluster algorithms 
for the quantum spin models which were omitted in I, namely, the model without the 
rotational symmetry with respect to the quantization axis. Therefore, this paper, together 
with I, completes the generalization of the SW-type cluster algorithm to the most general 
quantum spin Hamiltonian, i.e., the XY Z model Hamiltonian. 

In order to show the potential efficiency of the new algorithm, we performed simulations 
of the XY model taking the quantization axis in the easy-plane. 

II. THE OUTLINE OF THE SIMULATION 

The XY Z Hamiltonian we consider is 

n = -J2(Jo + J*s?s* + j v sys* + j a s;s;) (s* = s(s + (2.1) 

We do not assume here any special geometrical feature for the underlying lattice. The 
discussion given below holds for any lattice. Although the constant J is physically irrelevant, 
we include it to make the subsequent discussion look simpler. Here, we assumed that the 
coupling constants do not depend on the sites i or j. The generalization to inhomogeneous 
cases, however, is straight-forward. If J x = J y , we call the model a XXZ model. The 
algorithms for the XY Z models described in this paper is essentially the same as the ones 
for XXZ model in I, except that we use different ways (i.e., local graphs) for breaking-up 
plaquettes and different probabilities for the graph assignment (i.e., labeling probabilities). 
For the completeness, we will briefly repeat the mathematical background of the simulation. 
First, we "divide" each spin whose magnitude is S into 2S Pauli matrices: 

-1 2S 

S ?=oE<» {« = x,y,z). (2.2) 
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Note that we are expanding the Hilbert space by replacing the original 2S + 1 dimensional 
spin space for each spin by the 2 2S dimensional space for 2S Pauli spins each of which 
corresponds to a spin of the magnitude 1/2. In particular, for most states in the new 
Hilbert space, S? ^ S(S + 1) at some lattice point i. Therefore, such states should be 
excluded in computing the partition function. 

As the representation basis for this new Hilbert space, we take simultaneous eigenfunc- 
tions of the z-components of all the Pauli matrices. We designate these basis vectors with 
the symbol ri\. The symbol rii, therefore, stands for a set of 2SN one bit variables where 
N is the total number of lattice points in the original lattice. Then, the partition function 
of the original problem can be written as 

£ = E<S 

Here, P is the projection operator to the subspace in which S? = S(S + 1) for all i. 

In order to evaluate each term in (2.3), we have to compute the matrix elements of the 
Boltzmann operator multiplied by the projection operator. This task is, however, practically 
impossible for large systems. Therefore, we usually use Suzuki-Trotter decomposition [7] to 
map the problem into a classical problem. Accordingly, the lattice we will work on is not the 
original lattice on which the quantum problem is defined. Instead, we will consider many 
layers, each of which is geometrically equivalent to the original lattice, and will take this set 
of layers as a hyper-lattice which has dimension one greater than the original dimension. 
In what follows we call the hyper-lattice simply the lattice. We specify a lattice point in 
this hyper-lattice by a set of two indices, e.g., {k,i) where i specifies the lattice point in 
the original lattice and k specifies the layer to which the point belongs. Since 2SN one-bit 
variables are defined on each layer, the state of the system is described by 2SMN one-bit 
variables where M is the number of layers. To make it easier to visualize the situation, 
we imagine that each of these one-bit variables is defined on a vertex. In other words, 2S 
vertices are associated with each lattice point. We specify a vertex by three indices, e.g., 
{k,i,[i). We write the whole set of variables as n. The hyper-lattice can also be viewed as 
a collection of "shaded" plaquettes. Here a plaquette is a set of four lattice points {k,i), 
(k,j), {k + and {k + where i and j are nearest neighbors in the original lattice. 
A plaquette is also a set of 8S vertices. The use of the word "shaded" originated in the 
fact that plaquettes on which the four body interactions are defined are shaded or hatched 
in almost all the previous pictorial representations of the hyper lattice to distinguish them 
from other plaquettes each of which is merely a set of four nearest neighbor lattice points. 
In what follows, we call a "shaded" plaquette simply a plaquette. Now, our problem can be 
written as 

Z = J2I[sgn(n(p))w(n(p)) (2.4) 
n p 

where the product is taken over the set of plaquettes and n(p) is the subset of n whose 
elements are included in the plaquette p. Namely, for a plaquette p = {{k,i),{k,j),{k + 
l,i),(k + l,j)}, 

n(p) = {n(fc,;,i),?7.(fc,;,2), • • • , ™(fc,;,2S), n (k,j,i), n (k,j,2), • • • , ™(fc,j,2S), 

n {k+l,i,l)i n {k+l,i,2)i ' ' ' j n (k+l,i,2S), n (k+l,j,l), n (k+l,j,2), ' ' ' j n (k+l,j,2S)} ■ (2-5) 



p e -Wp 



[2.3) 



3 



The weight w(n(p)) and sgn(n(p)) are the absolute value and the sign of the local Boltzmann 
weight. Here, the local Boltzmann weight of the classical problem is a matrix element of an 
operator Pe A P where A is defined by 

A = E A ^, (2-6) 

A„,„ = K + K^a^ + K^y^ + K z a^al v . (2.7) 

The constants K a (a = x,y,z) in general depend on the plaquette and are related to the 
coupling constants J a in such a way that the sum of K a 's for all plaquettes with which both 
i and j are associated equals f3J a where (3 is the inverse temperature. 

In the paper I, we argued that once we obtain a set of coefficients v(g) > that satisfy 

w(n( P )) = 52v(g)A(n(p),g), (2.8) 

we can obtain a cluster algorithm. The resulting algorithm is characterized by the probability 
for the graph assignment 

p(g\n(p)) = v(g)A(n(p))/w(n(p)). (2.9) 

In (2.8), r is a set of graphs which depends on the magnitude of spins and the anisotropy 
of the model. 

The graph g is defined on the plaquette p and the function A(n(p),g) takes only two 
values and 1. A graph consists of edges and vertices where an edge is an object which 
connects two vertices. Each edge has a color, green or red. Like vertices, edges are introduced 
here just for making visualization and description of the algorithm easier. The function 
A(n(p),g) is defined as 



1 (If every green edge connects vertices with the same value, 

and every red edge connects vertices with different values.) . (2.10) 
(Otherwise) 



Now, the problem is to find a proper set of graphs T and coefficients v(g) that satisfy (2.8). 
Once we obtain these, the actual simulation goes as follows: Starting from an arbitrary 
initial state n, we first assign a graph g to each plaquette p with the probability (2.9). 
Because of the definition of A, a green edge can be assigned only to a pair of parallel Pauli 
spins whereas a red one can be assigned only to a pair of anti-parallel Pauli spins. When we 
finish this graph assignment for every plaquette, we view the union of all these graphs as 
a single global graph. Then in the next step, i.e., the flipping process, we flip each cluster 
in the global graph with probability 1/2. These two steps, graph assignment and cluster 
flipping, constitute one Monte Carlo step of the cluster algorithm. 

In the next section, we will discuss how we obtain T and the solution v(g) of the equation 
(2.8). 



III. THE DECOMPOSITION OF THE BOLTZMANN FACTOR 

As we discussed in I, the basis for a cluster Monte Carlo algorithm is the decomposition 
of the local Boltzmann weight w(n(p)) into a sum of terms each of which corresponds to 
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a graph, namely, the right hand side of (2.8). In terms of operators, (2.8) is equivalent to 
decomposing the operator p = \Pe A P\ = Pe' A 'P into the following form 

p = J2v(g)A(g) (v(g)>0) (3.1) 

where A(g) is an operator whose matrix elements are A(n(p),g) and |X| stands for the 
operator whose matrix elements are the absolute values of those of X . 

It is useful to consider a set of operators that can be written in the form similar to (3.1). 
We first define B as a set of operators which correspond to graphs, i.e., B = {A(g)\g £ T}. 
Then, we define 0(B) as a set of operators which are linear combinations of elements of B 
with non-negative coefficients. It is obvious that this set is closed also with respect to the 
multiplication by a non-negative real numbers and the addition of two elements. It is closed 
also with respect to the multiplication of two elements. (In other words, we have to choose 
the basis set B so that the set 0(B) is closed.) We can view 0(B) as a subset of the finite 
dimensional linear space spanned by the basis set B. Therefore, we can represent every 
operator in 0(B) as a finite dimensional vector. At the same time, multiplying an operator 
X £ 0(B) by another operator Y £ 0(B) from left can be viewed as some linear operation 
specified by Y applied to an operator (i.e., a vector) X. Therefore, we can represent every 
operator in 0(B) also as a finite dimensional matrix. With these definitions, (3.1) is written 
as 

p = £ v(X)X (v(X) > 0), (3.2) 

XEB 

which gives the vector representation v, whose elements are v(X), for the operator p. Here, 
we used the same symbol v as in (3.1) for the vector elements, since there is one-to-one 
correspondence between graphs in T and the basis operators in B. Note that (2.8), (3.1) 
and (3.2) are equivalent to each other. 

In order to obtain this vector representation for p, we first compute the vector represen- 
tation of |A Mi „| 

|A„,„| = E <W(1)1 (<wW > 0). (3.3) 

xeB 

Since the operator A Mi „ influences only two Pauli spins, cr"^ and cr^ v , the problem is essen- 
tially the same as the 5 = 1/2 problem as far as the solution of (3.3) is concerned. It is 
sufficient to consider graphs defined on only four vertices (k,i,p,), (k,j,v), (k + and 
(k + v) instead of graphs defined on 8S vertices. In other words, the solution of (3.3) is 
given by a direct product of the solution of the 5 = 1/2 problem and the identity operator 
for the dimensions related to neither one of the Pauli spins cr"^ and aj v . Once we obtain 
(3.3), we can easily get the expression 

|A| = E \KA = E <X)X (a(X) > 0), (3.4) 
»< v JteB 

where a(X) = °w(^0- Then, we can calculate the vector representation of (A) n 

(n = 2,3,4, • • •) by operating n — 1 times the matrix representation of A to the vector a. 
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In this way, we can calculate the vector representation of the operator e' A ' up to any finite 
order in |A| by Taylor expanding e' A L Since the radius of convergence is infinite in this case, 
we should be able to obtain a good approximation by truncating the Taylor series at some 
order. In addition, P also belongs to 0(B) and it is easy to obtain its vector and matrix 
representations. In fact, the vector representation is given by a simple formula 

^.^Pf^ (3 ' 5) 

where II is the set of graphs that consist of green vertical edges and have no vertex shared by 
more than one edge. In other words, II is the set of graphs which correspond to permutations 
of vertices. Therefore, the entire process of computing the vector representation of p can be 
done at least numerically. This procedure will be explained again in the next section with 
a concrete example. 
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FIG. 1. Eight possible states of a plaquette for S = 1/2. 



Now, our first problem is how to obtain the coefficient a(g) in (3.3). As we discussed above, 
decomposition of A can be obtained through that of A Mi „ which is essentially an S = 1/2 
problem. Therefore, we consider an S = 1/2 problem with only two Pauli spins for which 
we do not need indices \i or v. Here we consider an operator given by 

A = K + K x a*a* + K y a\a) + K z a\a) . (3.6) 

Since our Hilbert space in the present case is four dimensional, operators can naturally be 
expressed as 4 X 4 matrices. (Note, however, that this matrix representation is different from 
the one mentioned above.) Then, if we define 

ee + K 2 =K -K Z , K 3 = \K X + K y \, and K 4 = \K X - K y \, (3.7) 

the matrix that represents |A| is 
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K 



( K x if 4 \ 

K 2 K 3 

K 3 K 2 

Vif 4 ifi/ 



(3.8) 



We have assumed sufficiently large K so that K\, K 2 > 0. When the problem is mapped to 
a classical problem by the Suzuki-Trotter decomposition, each one of the matrix elements 
corresponds to a Boltzmann weight for a local state of a plaquette. From (3.8), it is obvious 
that only 8 among 16 states can have non-zero Boltzmann weights. These 8 states are shown 
in Fig. 1. If we number these states as shown in Fig. 1, the states a and a correspond to 
the same matrix element K a (a = 1,2,3,4). We should also notice that we can regard the 
matrix in (3.8) as the local weight for an 8-vertex model. Therefore, the graph decomposition 
presented below gives cluster algorithms for the 8-vertex model. 

Now, we consider the local graph that we can use to decompose the Boltzmann operator. 
A local graph partially fixes relative orientations of spins. For example, two spins connected 
by a red bond point to opposite directions. After flipping clusters, wwo spins not connected 
by bonds can have either relative orientation, parallel or anti-parallel. In the case of the 
XX Z model, because of particle number conservation, we could use only six types of local 
graphs, i.e., G^ aT ^ ^ 4) in Fig. 2. This was because if we assigned a graph other than 

these four to a plaquette, the local configuration resulting from flipping a cluster is not 
guaranteed to satisfy particle number conservation. In the present case, instead of particle 
number conservation, the local configuration must satisfy a weaker condition, conservation 
of the parity of the particle number. It can be expressed as 



m b i + m br = m u + m tT mod 2. 



(3.9) 



Here, mj; is the sum X^ m (fc,i,^) where (k,i) is the bottom-left corner of the plaquette. Other 
integers m^m^ and m tr are defined in a similar fashion for the bottom-right, top-left and 
top-right corners, respectively. There are only ten local graphs that do not violate this 
parity conservation after flipping any set of clusters in the graph. These graphs are shown 
in Fig. 2. We denote these graphs as G^ aT ^ = 1)2,3,4) as indicated in Fig. 2. Namely, 
in this case, 



1,2,3,4;<7 < r} and B = {A(G {(TT) )\a, r = 1, 2, 3, 4; a < r}. (3.10) 



In general, the graph G^ crr ) can be assigned only to four states a, t, a and f (two states a 
and a if a = t). After flipping edges in the graph, the resulting state is one of these states. 
In other words, 



(Otherwise) 



(3.11; 



or, in the matrix representation used for (3.8), A(G^), A(G^), A(G^) and A(G( 44 )) 
are represented by 



/l \ 





\ 1 / 



/0 0\ 
10 
10 

\o o o o/ 



/0 0\ 
10 
10 

\o o o o/ 



anc 



/ 1 \ 



V i o o o/ 



(3.12) 
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respectively. Note also that 

A(G {aT) ) = A(G {aa) ) + A(G (tt) ) {cj^t). (3.13) 
Therefore, using (3.8), the equation (3.4) can be simplified as 

if,=E fl H (O(«rr)>0). (3.14) 

T 

Here, a^ aT ^ is the abbreviation for a(A(G^ aT ^)). We have identified (err) with (tct). By 
expressing K a as a column vector K and a^,-) as a matrix A, we can rewrite (3.14) as 

K = Al, (3.15) 

where 1 is the four dimensional vector whose elements are all 1. Our problem is thus reduced 
to a problem of finding the 4x4 matrix A satisfying (3.15) given a four dimensional vector 
K with the constraint that a( ffT ) > and o,itr T ) = a^ TC7 y 

In general, many solutions for this equation exist, although which solution gives the most 
effective algorithm has not been studied extensively. In I, we briefly described a procedure 
based on the maximum entropy method [8] for choosing a feasible solution when we have 
no reason for favoring one of them over the other. In what follows, we will see at least one 
meaningful solution exists for an arbitrary set of parameters, K a . We will also see that if 
and only if the largest among K a 's is not larger than the sum of all others, we can get a 
solution corresponding to a loop algorithm. 




1 y \ 1 

<y * 

FIG. 2. Ten possible graphs of a plaquette for S = 1/2. 
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In what follows, we consider the case K z > and K x K y > 0. In this case, by taking 
sufficiently large K , we have the inequality K\ > K 2 > K 3 > if 4 > because of the 
definition (3.7) of K a . Once we get a solution in this case, we can get a solution in other 
cases as well simply by permuting indices. This is possible because (3.14) is invariant with 
respect to the index permutation. We should also note that the trivial solution 

a( CT0 .) = K a (a = 1,2,3,4), and a aT = (if a ^ t) (3.16) 

does not yield any useful algorithm because in this case all vertices in the system are con- 
nected each other to form a single cluster. In such a case, only two states, the initial state 
and the reversed state, can be realized. It is also argued in I that in general we should min- 
imize the possibility of two vertices being connected. Therefore, in general, we should avoid 
a solution in which a^ aa ^s are unnecessarily large because G^ aa ^ connects all four vertices of 
the plaquette and 'locks' them into a single degree of freedom whereas G^ crr ) connects them 
only pairwise and leaves two degrees of freedom. For example, if we have a solution in which 
a (n) > &(22) > 0, we can get a better solution by increasing fl(i2) by a^ 2 2) and decreasing fl(n) 
and fl(22) by the same amount. (As a result, a^ 2 2) becomes zero.) In other words, we can get 
a better solution by replacing G^ 11 ^ and G^ 22 ^ by G^ 12 \ From this example, it is clear that 
in the best solution, more than one a( CTCr ) cannot be non-zero. 

With these conditions, we now consider the following three cases: 1) K 2 + K 3 + if 4 < K\, 
2) K 2 + K 3 - K 4 < K x < K 2 + K 3 + K 4 , and 3) K x < K 2 + K 3 - K 4 . Equivalently, in terms 
of K x ,K y and K z , 1) \K Z \ is the largest among \K X \, \K y \ and \K Z \, 2) \K Z \ is the second 
largest among three, and 3) \K Z \ is the smallest among three. 

In the case 1), obviously we cannot have a solution in which ffl(n) is zero because 

< K x - K 2 - K 3 - K 4 < a (11) - J2 a (<rr) < O(n)- (3-17) 

cr,r=2,3,4 

This means that a loop algorithm solution does not exist. Considering the remark in the 
last paragraph, we should seek a solution in which fl(22) = a (33) = a (44) = 0. Among such 
solutions, the one that minimizes ffl(n) is 

a (n) = K 1 — K 2 — K 3 — K 4 , (3.18) 
a (lCT) = if CT (for a = 2,3,4), (3.19) 
a {aT) = (for a ^ 1 and r ^ 1). (3.20) 

In the case 2) and 3), as we will see below, solutions exist for which all a^ aa ^s are zero. 
This kind of solution corresponds to a loop algorithm in which the clusters formed do not 
have any branching, i.e., they are merely loops. In what follows, we will consider only such 
solutions for the motivation described above, although other solutions also exist. Given this 
condition, there are only 6 independent variables to be determined with the 4 independent 
equations (3.14). Therefore, the solution space is in general two-dimensional. To be more 
specific, in the matrix form, the general solution of (3.15) for the case 2) and 3) must have 
the following form: 

A = A + uU + vV (3.21) 
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where A is a special solution of (3.15), u and v are real numbers, and U and V are symmetric 
matrices which satisfy Ul = VI = 0. To be specific, 



U 



( -1 1 \ 
-10 10 
10-1 

\i o-io/ 



an> 



d V 



( -1 1 \ 
-10 1 
10 0-1 

V o l-io/ 



(3.22) 



It is easy to see that 

/ 



2 



0, 
2K 2 , 



2K 2 , K x -K 2 + K 3 -K±, K x - K 2 - K 3 + K A \ 
0, 0, 



K 1 -K 2 + K 3 - K 4 , 0, 



0, 



V K 1 - K 2 - K 3 + K 4 , 0, + K 2 + K 3 + K. 



4j 



+ if 2 + K 3 + if 4 





/ 

(3.23) 



satisfies (3.15). The condition > imposes a restriction on the range of (u,v). In the 

case 2), we have 



0<u, 0<v, u + v <!(-#! + K 2 + K 3 + K A ) 



(3.24) 



In the case 3), if we define A x = A + (l/2)(-K 1 +K 2 + K 3 + K 4 )U and u' = u-(l/2)(-K 1 + 
K 2 + K 3 + Ki), the general solution can be written as 



A = A 1 +u'U + vV. 

The condition on (u',v) is 

< it', < v, u' + v<K 4 . 
Thus, we have obtained the whole set of loop algorithm solutions of (3.15). 



(3.25) 



(3.26) 



IV. AN EXAMPLE — THE XY MODEL WITH THE QUANTIZATION AXIS IN 

THE EASY PLANE 

A. The algorithm 

In this section, we discuss how the decomposition of the operator A (3.4) presented in 
the last section is used for constructing a loop algorithm. As an example, we take the XY 
model with the quantization axis lying in the easy plane. This is equivalent to choosing 
Jx = Jz = J > and J y = while we use the conventional representation of Pauli matrices 
in which z-components are diagonal matrices. This representation is useful not only for 
giving an example for what we have discussed but also for some practical purposes. Namely, 
with this representation, we can easily calculate the correlations between spin-components 
in the easy-plane, i.e., z-components in the present case. Since the singularity due to the 
Kosterlitz-Thouless transition is manifested most strongly in such correlations, taking this 
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representation is advantageous. On the other hand, we have to develop an algorithm different 
from the one for the 6-vertex model [5], because with this representation the system no longer 
maps to a 6-vertex model even in the S = 1/2 case. 

Obviously, the XY model is a marginal case that belongs to both case 1) and case 2) 
described in the last section. The solution is given by 

a (12) = K 2 = Ko-K z = K -K, 
a ( i3) = K 3 = \K X + K y \ = K, 
a(i4) = K 4 = \K X — K y \ = K, 

a {aT) = (if (<tt) = (11), or, a ^ 1 and r ^ 1 ). (4.1) 



Namely, 



|A| = (K - K)A 2 + KA 3 + K A 4 , (4.2) 



where A CT is an abbreviation for A(G^ lcr )). Note that the linear space spanned by A 2 , A 3 
and A 4 is closed with respect to the multiplication. Therefore, as the basis set B, we can 
take {A 2 , A 3 , A 4 }. To be more specific, 

A 2 A 2 = A 2 , A 2 A 3 = A 3 , A 2 A 4 = A 4 , 

A 3 A 2 = A 3 , A 3 A 3 = A 2 , A 3 A 4 = A 4 , (4.3) 

A 4 A 2 = A 4 , A 4 A 3 = A 4 , A 4 A 4 = 2A 4 . 

Note that 

A CT A r = A r A CT . (4.4) 

Using (4.3) and (4.4), we have 

e A = e (ifo-if)A 2e xA 3e ifA 4 

= [e^ K °- K ^A 2 ][coshK A 2 + sinhif A 3 ][A 2 + l -{e 2K - 1)A 4 ] 

= e Ko (e- K cosh K A 2 + e~ K sinh K A 3 e K sinh K A 4 ). (4.5) 

This means that 

v (i2) = e ~ K cosh K, v (i3) = e ~ K smn K ) f(i 4 ) = sinh if (4-6) 

in the equation (2.8). (We omitted e K ° since it does not affect the resulting algorithm at 
all.) Therefore, the equation (2.9) leads to 

p((ar)|r) = p(M|f) = -^- > (4.7) 

1^*' V {<t't) 

where p((ct)\() stands for p(G^ aT ^\(). More explicitly, 

K(12)|l) = e" 2K , K(13)|l) = e- 2K tanhif, p((14)|l) = tanhtf, 
K(12)|2) = 1, p((13)|3) = 1, p((14)|4) = 1, 
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p((ar)\() = (if i± a and ( ^ r) 



(4.9) 



and 

p{{ar)\() = p{{ar)\(). (4.10) 

For the systems with spin S larger than 1/2, we can follow essentially the same line 
to obtain the labeling probability. In other words, first we find a set of basis operators B 
that spans a linear space closed with respect to the multiplication, then calculate the vector 
representation of e' A ' in terms of these basis operators. However, we cannot in general 
simplify this calculation as we did in the case of S = 1/2, because commutativity (4.4) does 
not hold for the basis operators in the case of S > 1/2. We can still at least numerically 
calculate the expansion of e' A ' as discussed in I by Taylor series expanding e' A ' in terms of 
|A|. Since the radius of convergence circle of this Taylor expansion is infinity, we should be 
able to get a good approximation if we truncate the series at the finite but sufficiently high 
order. 

For example, in the case where S = 1, J x = J z = J > and J y = 0, similar to 
S = 1/2 case, only graphs we have to take into account are those which connect vertices 
pair- wise. Therefore, there are 8!/(2 4 4!) = 105 distinct graphs to consider. In other words, 
105 operators that corresponds to these graphs constitute the basis set B. The product of 
arbitrary two basis operators is another basis operator except for the numerical factor 2 m 
due to inner closed loops as discussed in I. Namely, for arbitrary two elements X and Y of 
B, another element Z(X,Y) of B and an integer m(X,Y) exist that satisfy 

XY = 2 m ^' f) Z(X,Y). (4.11) 

It is tedious but straight-forward to calculate m(X, Y) and Z(X, Y) for all possible pairs of 
X and Y and prepare a table similar to (4.3). Once we have this table, we can calculate the 
product of two arbitrary operators in the linear space 0{B), easily. To be specific, for two 
operators in 0(B), S = Y^xeB S {X)X and T = Y^xeB t{X)X, we have 

ST=J2 s(X)t(Y)XY = s(X)t(Y)2 m ^'^Z(X, Y) = J>W^ ( 4 - 12 ) 

X,Y X,Y X 

where 

u(W)= J2 s(X)t(Y)2 m ^' f \ (4.13) 

X,Y 
W=Z(X,Y) 

In other words, multiplying T by S from left is equivalent to multiplying a vector defined 

by 

(T) x = t{X) (4.14) 

by a matrix 

{Sh,f= E s(W)2™^ (4.15) 

w 

X=Z(W,Y) 
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from left. Therefore, since P and |A| is an element of 0(B), calculating the vector represen- 
tation of P\A\ n P in terms of the basis operators is computationally straight-forward. Hence, 
Pe' A 'P can be calculated at least numerically. In fact, the calculation described here can be 
simplified significantly if we take symmetry into account. 

There is yet another way of calculating Pe' A 'P. Namely, explicitely solving the linear 
algebraic equation (2.8) with respect to v(g). It is obvious the solution obtained by the 
former method is unique and satisfies (2.8), although the solution of (2.8) is not necessarily 
unique. 



B. Comparison of the conventional algorithm and the loop algorithm 

We applied the loop algorithm described in the last subsection to the Xl^-model on 
a square lattice with the periodic boundary condition. At the same time, we applied the 
conventional algorithm to the same system to compare the efficiency of the two algorithms. 
Ding and Makivic [9] reported that this system undergoes a Kosterlitz-Thouless type phase 
transition at T ~ 0.35. Therefore, we expect critical-slowing down for the conventional algo- 
rithm. We also expect another slowing-down as the imaginary time spacing becomes smaller 
with a fixed temperature as it happened [4] in the one-dimensional S = 1 antiferromagnetic 
Heisenberg model. 

The details of the conventional algorithm used in this paper is presented in Appendix. 
The conventional algorithm is the same as the one used in [10] except that we included 
"diagonal" flips to make the simulation ergodic. (As we will see in the Appendix, the 
algorithm in [10] is not completely ergodic although the effect of this non-ergodicity may not 
be significant.) We remark here that one usually needs to be concerned about the ergodicity 
in the conventional algorithm and that the actual computer programs for the conventional 
algorithm tend to be complicated because to ensure ergodicity one has to incorporate several 
different updates in a non-unified fashion. If we have four different kinds of flips, we usually 
write four different subroutines. On the other hand, the cluster algorithm is less likely non- 
ergodic because in most cases the cluster algorithm includes wider class of updates than 
the conventional algorithm. As discussed in I, we can even prove ergodicity for the cluster 
algorithm in some cases. In addition, in a cluster algorithm, all kinds of updates can be 
realized in a unified fashion in actual computer programs. 

We calculated the integrated auto-correlation time defined [11] by 

where vx(b) is the variance of the distribution of the bin averages with the bin length of b. 
This quantity should be equal to the integrated auto-correlation time defined by 

oo 

4 nt) = E <^)*(0))mc / {X(0)X(0)) MC (4.17) 

t=o 

in the limit of b — > oo. Here, X(t) is an arbitrary physical quantity measured at the t-th 
Monte Carlo step. As a function of increasing b, Tx(b) is generally a non-decreasing function. 
Upto a certain point, say, b ~ Ty° r \ Tx(b) increases and after this point, roughly speaking, it 
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takes the constant value . In a typical simulation that we have done, 4 intJ < T^° r> . We 
regard Ty Cor ^ as the number of Monte Carlo steps needed for decorrelating two measurements 
of X completely. In order to obtain an estimate of Tx(b) with a small statistical error, we 
had to perform a simulation much longer than b, in general. Therefore, in case the total 
number of Monte Carlo steps T is larger but not much larger than T^° r \ it is difficult to 
judge if the function Tx(b) has reached the plateau because the plateau is blurred by large 
statistical errors. Hence, we often cannot estimate Ty"^ precisely. In such cases, we took 
Tx{b) at the largest bin length b where a statistically precise estimate is still possible and 
regarded it as a lower bound of Ty"^ = tx(oo). We calculated Tx(b) with magnetization 
(M = J2rSz{R)), susceptibility (M 2 /N), and nearest-neighbor spin-spin correlations 
S a {R)S a {R + 6)/N (a = x,y,z)). 

As for the conventional algorithm, we found that the autocorrelation time of the nearest- 
neighbor spin-spin correlations for the in-plane spin components (i.e., x and z components) 
is equal to or larger than that for the susceptibility. On the other hand, the auto-correlation 
time for the y components is smaller than that for the susceptibility in most cases we studied. 
In Table I, we show the autocorrelation times of the uniform magnetization and the magnetic 
susceptibility for the conventional algorithm. 



At 


P 


Magnetization 


Susceptibility 


L = 4 


L = 8 


L = 16 


L = 4 


L = 8 


L = 16 


1.000 


1 


0.534(11) 


0.499(07) 


0.511(05) 


0.550(10) 


0.504(06) 


0.498(06) 


1.000 


2 


10.38(60) 


27.8(17) 


49.7(LB) 


5.67(51) 


9.60(68) 


20.3(LB) 


1.000 


4 


32.7(26) 


139.6(54) 


496(LB) 


16.8(11) 


51.3(24) 


242(LB) 


0.500 


1 


0.768(23) 


0.700(31) 




0.741(32) 


0.647(37) 




0.500 


2 


10.6(15) 


24.8(14) 




5.68(59) 


12.1(11) 




0.500 


4 


51.1(LB) 






26.6(16) 






0.250 


1 


0.745(30) 


0.678(20) 




0.757(09) 


0.633(24) 




0.250 


2 


13.26(36) 


28.70(86) 




10.89(22) 


35.8(33) 




0.250 


4 


115.4(52) 






53.2(24) 






0.125 


1 


0.727(08) 


0.725(25) 




1.079(62) 


0.662(10) 




0.125 


2 


14.93(67) 


30.7(23) 




30.6(11) 


115(LB) 




0.125 


4 


231(LB) 






144.0(84) 







TABLE I. The integrated autocorrelation times of the conventional algorithm for the uniform 
magnetization and the susceptibility. The system size is L X L. The figures in the parentheses 
are statistical errors (one standard deviation). "LB" indicates that the value shown is the lower 
bound. 
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As we can clearly see, the correlation times for both the quantities show the slowing down 
as the temperature becomes low. It is also clear that for lower temperatures (/3 = 2,4), the 
autocorrelation time grows fast as the system becomes larger. We consider this growth a 
finite size effect for (3 = 2 since this temperature is higher than the critical temperature 
Act ~ 2.9. Since /3 = 4 is below the critical temperature, we would observe the growth 
of the correlation time for larger systems. The slowing-down due to the increment of the 
Trotter number is also observed. But, it is significant only in the case of /3 = 2 and 4. On 
the other hand, we observed no clear evidence for any slowing-down in the case of the loop 
algorithm. The autocorrelation times for the magnetization are 0.5, as they should be, with 
statistical errors of a few percents. The autocorrelation times for the susceptibility are also 
almost constant and are around 1.0 or less (See Table II). 

We should stress that the cluster algorithm has another significant advantage besides 
the reduction of the autocorrelation times, namely, the improved estimators [1,13]. In this 
paper, we calculated susceptibility by using an improved estimator. It is well-known that for 
the Ising model the improved estimator for the magntic susceptibility is simply the average 
cluster size. In the present case, too, the improved estimator of the magnetic susceptibility 
for z-components is proportional to the average cluster size. To be more specific, 

<^ 2 ) = ^(E^). (4-18) 



At 


P 


L = 4 


L = 8 


L = 16 


1.000 


1 


0.883(29) 


0.661(23) 


0.529(12) 


1.000 


2 


1.004(37) 


1.093(44) 


1.251(84) 


1.000 


4 


0.919(29) 


0.952(64) 


0.978(30) 


0.500 


1 


0.901(40) 


0.589(23) 




0.500 


2 


1.033(37) 


1.077(61) 




0.500 


4 


0.920(30) 






0.250 


1 


0.772(25) 


0.603(15) 




0.250 


2 


0.973(40) 


1.085(40) 




0.250 


4 


0.986(32) 






0.125 


1 


0.800(17) 


0.629(22) 




0.125 


2 


0.991(29) 


1.138(50) 




0.125 


4 


0.968(26) 







TABLE II. The integrated autocorrelation times of the loop algorithm for the susceptibility. 
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p 


X 


Algorithm 


Nmcs 


N int 


{Ml) IN 


Error 


1 


4 


C 


8,192 


1 


3.750 


0.011 


1 


4 


L 


8,192 


1 


3.762 


0.018 


1 


4 


I 


8,192 


1 


3.781 


0.008 


1 


8 


C 


8,192 


1 


4.207 


0.021 


1 


8 


L 


8,192 


1 


4.196 


0.016 


1 


8 


I 


8,192 


1 


4.193 


0.007 


1 


16 


C 


8,192 


1 


4.214 


0.019 


1 


16 


L 


8,192 


1 


4.205 


0.018 


1 


16 


I 


8,192 


1 


4.202 


0.004 


2 


4 


C 


32,768 


4 


7.720 


0.030 


2 


4 


L 


8,192 


1 


7.726 


0.024 


2 


4 


I 


8,192 


1 


7.712 


0.012 


2 


8 


C 


32,768 


2 


22.779 


0.158 


2 


8 


L 


8,192 


1 


22.621 


0.086 


2 


8 


I 


8,192 


1 


22.630 


0.055 


2 


16 


C 


32,768 


4 


57.588 


0.473 


2 


16 


L 


8,192 


1 


57.135 


0.265 


2 


16 


I 


8,192 


1 


57.075 


0.153 


4 


4 


C 


32,768 


2 


7.931 


0.059 


4 


4 


L 


8,192 


1 


7.927 


0.027 


4 


4 


I 


8,192 


1 


7.927 


0.014 


4 


8 


c 


262 144 


16 


28.534 


0.148 


4 


8 


L 


8,192 


1 


28.866 


0.091 


4 


8 


I 


8,192 


1 


28.817 


0.050 


4 


16 


C 


262,144 


32 


103.170 


1.404 


4 


16 


L 


8,192 


1 


103.161 


0.362 


4 


16 


I 


8,192 


1 


103.334 


0.225 



TABLE III. The estimated values for the susceptibility and the statistical error in the case of 
At = 1. In each entry, the top, middle and bottom figures are the estimates by the conventional 
algorithm (C), the loop algorithm (L), and the loop algorithm with the improved estimator (I), 
respectively. Each simulation consists of 10 sets where one set consists of Nmcs Monte Carlo steps 
for measurements with a sufficiently large number of additional Monte Carlo steps for equillibration. 
A measurement of susceptibility is done every Ni nt Monte Carlo steps. The last column is the 
estimate of statistical error in one standard-deviation. 
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In Table III, we listed the three sets of estimates for the susceptibility, i.e., the ones 
obtained with the conventional algorithm, the ones with the loop algorithm, and the ones 
with the loop algorithm and the improved estimator. For smaller values of At, the mag- 
nitude of the statistical errors for the conventional algorithm relative to that for the loop 
algorithm without the improved estimator tends to be larger. We can see that the difference 
between the statistical errors for the conventional algorithm and those for the loop algorithm 
without the improved estimator is consistent with the estimated integrated autocorrelation 
times shown in Table I and II. We also see that the improved estimator further reduces the 
statistical error considerably. Since using the improved estimator is equivalent to averaging 
over 2 Nc different configurations where N c is the number of clusters, its advantage is more 
significant when N c is larger, i.e., at higher tempereratures. On the other hand, the reduc- 
tion of the auto-correlation time by using the loop algorithm is less significant at higher 
temperatures. Therefore, the two advantages of the loop algorithm, i.e., the reduction in 
the correlation times and the reduction in the variance of the thermodynamic distribution 
by the improved estimators, are complementary to each other. 

The difference in the overall efficiency is striking. For example, in the case of At = 1.0, 
/3 = 4 and L = 16, the simulation by the conventional algorithm is 32 times longer than the 
loop algorithm simulation. However, the conventional algorithm yields a statistical error 
6 times larger than that of the loop algorithm simulation with the improved estimator. 
Since the statistical error is proportional to the reciprocal of the square-root of the total 
number of the Monte Carlo steps, this difference in the efficiency is roughly equivalent to a 
factor 32 X 6 2 /r ~ 10 3 /r in terms of the computational time where r is the computaional 
time per one Monte Carlo step of the loop algorithm devided by that of the conventional 
algorithm. The factor r strongly depends on the detail of the software and the hardware. 
In our particular case, 2 < r < 4. Therefore, even for the small systems studied here, the 
difference is more than two orders of magnitude in the real CPU time. 

V. CONCLUSIONS 

In this paper, we presented how to construct a cluster algorithm for a model described 
by the XYZ Hamiltonian. The algorithm for the special case of S = 1/2 can be viewed 
as a cluster algorithm for the 8-vertex model as well as the algorithm for the quantum spin 
systems. The efficiency of the new algorithm is examined for the S = 1/2 XY model on a 
square lattice. It is found that the integrated autocorrelation times of the new algorithm 
for various physical quantities do not show any significant slowing-down whereas the con- 
ventional algorithm suffers from slowing-down due to the low temperature and the small 
imaginary time spacing. Even for the small system sizes (L < 16) studied in this paper, the 
difference between the autocorrelation times for the two algorithms can be three orders of 
magnitude, and this difference is very likely much larger for larger systems. In addition, we 
observed that we can further reduce the statistical error by measuring quantities through 
improved estimators. 



Acknowledgment 



17 



The auther would like to thank J. E. Gubernatis for useful comments and critical reading 
of the manuscript. 



APPENDIX: THE CONVENTIONAL ALGORITHM 

The conventional algorithm used in this paper is the same as the one in [10] except that 
we added so-called "diagonal" loop flips. The algorithm in [10] consists of three types of 
updates; 1) local "space" flips, 2) local "time" flips and 3) global flips in the time direction. 
These flips, except for the diagonal flips, are described in [12]. It seems that in the simulation 
described in [10], the second global flip in [12], i.e., global flips in the space directions were 
not performed. In the case of the Heisenberg model, these global spatial flips are needed to 
make the algorithm ergodic. In other words, the other three types of updates do not change 
the total winding number of the world lines. Therefore, the simulation without the spatial 
global flips is not ergodic. In the case of the XY Z model, the local conservation rule of 
the magnetization does not hold. It means that the worldlines are no longer well-defined. 
Therefore, it is not straight-forward to see if there are conserved quantities. In fact, however, 
conserved quantities exist if we do not include the spatial global flips or something equivalent 
to it. To see this, let us cut the system by a plane parallel to the y and t axes and consider 
the cross-section. Then, we take a half of the lattice points on this cross-section whose time 
coordinates k are odd (see Fig. 3). 



k=1 o- 



k=8 



k=7 9 <: f K j H T44 K fy-^ 

k=6 \S \J W \S W 



k=5 o 6- 



k=4 



k=3 



k=2 



< i^— ( i— — < i^— < — -( ^ 



k=1 



■< >< ■< >< 

ii ii ii ii 
ro co ^ — *■ 



FIG. 3. A cross-section by a plane parallel to the j/z-plane. The set H is 
denned as the set of lattice points marked by open circles in the figure. 
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We refer to the set of these lattice points as H . Now, we define P x as the parity of the 
total number of lattice points in H on which the spins are 'up' (i.e., n^,i) = 1)- The P x 
does not depend on the location of the plane to cut the system. We claim that P x is not 
changed by local spatial flips, local temporal flips, or global temporal flips. The reason is 
simply that the intersection of H and the set of lattice points that are affected by one of 
those flips always consists of an even number of lattice points. Since we can define P y in 
a similar fashion, our claim implies that there are at least two conserved numbers for the 
whole system in 2+1 dimensional systems. 

Of course, the above definition of P x and P y is valid also for XX Z models for which the 
local conservation rule for the magnetization applies. It is easy to see that P a (a = x,y) 
equals the parity of the total winding number in the a-direction in the case of the XX Z 
model. The above argument can be easily generalized to other dimensions and different 
Suzuki-Trotter decompositions by changing the definition of H appropriately. 

In this paper, we used global diagonal flips instead of the global spatial flips to make the 
algorithm ergodic. To be specific, a diagonal flip in the x-direction is a flip of a loop which 
is constructed by the following rules. 1) Take a lattice point (x,y,t) where t = 1,2, • • • ,M 
specifies the imaginary time coordinate. 2) If (x,y,t), (x + l,y,t), (x + l,y,t + 1) and 
(x,y,t-\- 1) are four corners of a "shaded" plaquette, take (x + l,y,t + l) as the next point. 
Otherwise, take {x,y,t + 1) instead. 3) Repeat 2) until the current x-coordinate coincides 
with the x-coordinate of the starting point. 4) Take (x, y, t + 1) as the next point. 5) Repeat 
4) until the current point coincides with the starting point. In a similar fashion, we can 
define diagonal flips in the y-direction. In the actual simulations, for every Monte Carlo 
step, we included one set of diagonal flips in both x and y directions. Here, 'one set' of 
diagonal flips in the x-direction means the updates of all diagonal loops in the x-direction 
whose starting points are given by (0, y, t) (y = 1, 2, • • • , L; t = 4, 8, 12, • • • , M). (Note that 
in the present case, the hyper-lattice is periodic with the period of 4 in the imaginary time 
direction.) 
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